Abstract

We present techniques for maintaining subgraph frequencies in a dynamic graph, using data structures
that are parameterized in terms of h, the h-index of the graph. Our methods extend previous results
of Eppstein and Spiro for maintaining statistics for undirected subgraphs of size three to directed
subgraphs and to subgraphs of size four. For the directed case, we provide a data structure to maintain
counts for all 3-vertex induced subgraphs in O(h) amortized time per update. For the undirected case,
we maintain the counts of size-four subgraphs in $O(h^2)$ amortized time per update. These extensions
enable a number of new applications in Bioinformatics and Social Networking research.

1 Introduction 

Deriving inspiration from work done on fixed-parameter tractable algorithms for NP-hard problems (e.g.,
see [7, 8, 17, 25]), the area of parameterized algorithm design involves defining numerical parameters for
input instances, other than just the input size, and designing data structures and algorithms whose
performance can be characterized in terms of those parameters. The goal, of course, is to find useful parameters
and then design data structures and algorithms that are efficient for typical values of those parameters (e.g.,
see [12, 13]). In this paper, we are interested in extending previous applications of this approach in the
context of dynamic subgraph statistics, where one maintains the counts of all (induced and non-induced)
subgraphs of certain types, from undirected size-three subgraphs [13] to applications involving directed
size-three subgraphs and undirected subgraphs of size four.

Upon cursory examination this contribution may seem incremental, but these extensions allow for the
possibility of significant computational improvement in several important applications. For instance, in
bioinformatics, statistics involving the frequencies of certain small subgraphs, called graphlets, have been
applied to protein-protein interaction networks [22, 28] and cellular networks [27]. In these applications,
the frequency statistics for the subgraphs of interest have direct bearing on biological network structure and
function. In particular, in these graphlet applications, the undirected subgraphs of interest include one
size-two subgraph (the 1-path), two size-three subgraphs (the 3-cycle and 2-path), and six size-four subgraphs
(the 3-star, 3-path, triangle-plus-edge, 4-cycle, $K_4$ minus an edge, and $K_4$), which we respectively illustrate
later in Fig. 7 as $Q_4$, $Q_6$, $Q_7$, $Q_8$, $Q_9$, and $Q_{10}$.

In addition, maintaining subgraph counts in a dynamic graph is of crucial importance to statisticians
and social-networking researchers using the exponential random graph model (ERGM) [13, 29, 30, 33] to
generate random graphs. ERGMs can be tailored to generate random graphs that possess specific properties,
which makes ERGMs an ideal tool for Social Networking research [33, 30]. This tailoring is accomplished
by a Markov Chain Monte Carlo (MCMC) method [30], which generates random graphs via a sequence of
incremental changes. These incremental changes are accepted or rejected based on the values of subgraph
statistics, which must be computed exactly for each incremental change in order to facilitate acceptance or
rejection. Thus, there is a need for dynamic graph statistics in ERGM applications.

Typical graph attributes of interest in ERGM applications include the frequencies of undirected stars and
triangles, which are used in the triad model [16] to study friends-of-friends relationships, as well as other
more-complex subgraphs, including undirected 4-cycles and two-triangles ($K_4$ minus an edge), and
directed transitive triangles, which we illustrate in Fig. 3. Therefore, there is a salient need for
algorithms to maintain subgraph statistics in a dynamic graph involving directed subgraphs of size three and
undirected subgraphs of size four.

Interestingly, extending the previous approach of Eppstein and Spiro [13] for maintaining undirected
size-three subgraphs to these new contexts involves overcoming some algorithmic challenges. The previous
approach uses a parameterized algorithm design framework for counting three-vertex induced subgraphs in
a dynamic undirected graph. Their data structure takes O(h) amortized time per graph update
(assuming constant-time hash table lookups), where h is the largest integer such that there exist h vertices
of degree at least h, which is a parameter known as the h-index of the graph. This parameter was introduced
by Hirsch [18] as a combined way of measuring productivity and impact in the academic achievements of
researchers. In spite of its drawbacks for this purpose, it is a useful parameter for dynamic graph
algorithms, as demonstrated by Eppstein and Spiro. As we will show, extending the approach of Eppstein and
Spiro to directed subgraphs of size three and undirected subgraphs of size four involves more than doubling
the complexity of the algebraic expressions and supporting data structures needed. Ensuring the directed
size-three procedure maintains the complexity bounds of previous work required extensive understanding
of dynamic graph composition. Developing the approach for size-four subgraphs that would allow only the
addition of a single factor of h required innovative work with the structure of stored graph elements.

1.1 Other Related Work 

Although subgraph isomorphism is known to be NP-complete, it is solvable in polynomial time for small
subgraphs. For example, all triangles and four-cycles can be found in an n-vertex graph with m edges in
$O(m^{3/2})$ time [19]. All cycles up to length seven can be counted (but not listed) in $O(n^{\omega})$ time,
where $\omega \approx 2.376$ is the exponent for the asymptotically fastest known matrix multiplication algorithm.
Fast matrix multiplication has also been used for other problems of finding and counting small
cliques in graphs and hypergraphs [9, 20, 23, 32, 34]. Also, in planar graphs, the number of copies of any
fixed subgraph may be found in linear time [10, 11]. These previous approaches run too slowly for the
iterative nature of ERGM Markov Chain Monte Carlo simulations, however.

1.2 Our Results 

In this paper, we present an extension of the h-index parameterized data structure design from statistics for
undirected subgraphs of size three to directed subgraphs of size three and undirected subgraphs of size four.
We show that in a dynamic directed graph one can maintain the counts of all directed three-vertex subgraphs
in O(h) amortized time per update, and in a dynamic undirected graph one can maintain the four-vertex
subgraph counts in $O(h^2)$ amortized time per update, assuming constant-time hash-table lookups (or
worst-case amortized times that are a logarithmic factor larger). These results therefore provide techniques for
application domains, in Bioinformatics and Social Networking, that can take advantage of these extended
types of statistics. In addition, our data structures are based on a number of novel insights into the combinatorial
structure of these different types of subgraphs.



2 Preliminaries 

As mentioned above, we define the h-index of a graph to be the largest h such that the graph contains h
vertices of degree at least h. We define the h-partition of a graph to be the sets (H, V \ H), where H is the
set of vertices that form the h-index.
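
As a small illustration of this definition, the following sketch computes the h-index from a list of vertex degrees and derives an h-partition from an adjacency map. The function names and the adjacency-map representation are assumptions made for this example only, and ties between equal-degree vertices are broken arbitrarily.

```python
def h_index(degrees):
    """Return the largest h such that at least h entries of `degrees` are >= h."""
    h = 0
    for rank, degree in enumerate(sorted(degrees, reverse=True), start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


def h_partition(adjacency):
    """Split the vertices into (H, V \\ H), where H holds the h highest-degree vertices.

    `adjacency` maps each vertex to the set of its neighbors (an assumed format).
    """
    by_degree = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)
    h = h_index(len(adjacency[v]) for v in adjacency)
    return set(by_degree[:h]), set(by_degree[h:])
```

This static computation sorts all degrees; the dynamic setting discussed in Section 2.2 avoids recomputing from scratch.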

2.1 The H-Index 



Figure 1: Scatter plot of h-index versus network size, from Eppstein and Spiro [14].

It is easy to see that the h-index of a graph with m edges is $O(\sqrt{m})$; hence it is $O(\sqrt{n})$ for sparse
graphs with a linear number of edges, where n is the number of vertices. Moreover, this bound is optimal in
the worst case, e.g., for a graph consisting of $\sqrt{n}$ stars of size $\sqrt{n}$ each. As can be seen in Fig. 1, Eppstein
and Spiro [13] show experimentally that real-world social networks often have h-indices much lower than
the indicated worst-case bound. These indices, perhaps more easily viewed in log-log scale in Fig. 2, were
calculated on networks with a range of ten to just over ten thousand nodes. The h-indices of these networks
were consistently below forty with only a few exceptions, none greater than slightly above one hundred.
Moreover, many large real-world networks possess power laws, so that their number of vertices with degree
d is proportional to $n d^{-\lambda}$, for some constant $\lambda > 1$. Such networks are said to be scale-free [2, 21, 24, 26],
and it is often the case that the parameter $\lambda$ is between 2 and 3 in real-world networks. Note that the h-index
of a scale-free graph is $h = O(n^{1/(1+\lambda)})$, since it must satisfy the equation $h = n h^{-\lambda}$. Thus, for instances of
scale-free graphs with $\lambda$ between 2 and 3, an algorithmic performance of O(h) is much better than the
worst-case $O(\sqrt{n})$ bound for graphs without power-law degree distributions. For example, an O(h) time bound
for a scale-free graph with $\lambda = 2$ would give a bound of $O(n^{1/3})$ while for $\lambda = 3$ it would give an $O(n^{1/4})$
bound. Likewise, an algorithmic performance of $O(h^2)$ is much better than a worst-case performance of
O(n) for these instances, for $\lambda = 2$ would give a bound of $O(n^{2/3})$ while for $\lambda = 3$ it would give an
$O(n^{1/2})$ bound. Thus, by taking a parametric algorithm design approach, we can, in these cases, achieve
running times better than worst-case bounds characterized strictly in terms of the input size, n.
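
For reference, the exponents quoted above follow from a one-line calculation, written out here with $\lambda$ denoting the power-law exponent. This only restates the bounds already given in the text.

```latex
\[
  h = n\,h^{-\lambda}
  \;\Longrightarrow\; h^{1+\lambda} = n
  \;\Longrightarrow\; h = n^{1/(1+\lambda)},
\]
\[
  \lambda = 2:\quad h = n^{1/3},\; h^2 = n^{2/3};
  \qquad
  \lambda = 3:\quad h = n^{1/4},\; h^2 = n^{1/2}.
\]
```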

2.2 Maintaining Undirected Size-3 Subgraph Statistics 

As mentioned above, Eppstein and Spiro [13] develop an algorithm for maintaining the h-index and the
h-partition of a graph under edge insertions, edge deletions, and insertions/deletions of isolated vertices
in constant time plus a constant number of dictionary operations per update. Observing that the h-index
doubles after $\Omega(h^2)$ updates, Eppstein and Spiro further show a partitioning scheme requiring amortized
O(1/h) partition changes per graph update. This partitions the graph into sets of low- and high-degree
vertices, which we summarize in Theorem 2.1.

Theorem 2.1 ([13]). For a dynamic graph G = (V, E), we can maintain a partition (H, V \ H) such that
for $v \in H$, $\mathrm{degree}(v) = \Omega(h)$ and $|H| = O(h)$; and for $u \in V \setminus H$, $\mathrm{degree}(u) = O(h)$, in constant time
per update, with amortized O(1/h) changes to the partition per update.

Using this partitioning scheme, one can develop a triangle-counting algorithm as follows. For each pair 
of vertices i and j, store the number of length-two paths P[i, j] that have an intermediate low-degree vertex.
Whenever an edge (u, v) is added to the graph, increase the number of triangles by P[u, v], and update the 
number of length-two paths containing (u, v) in O(h) time. One can then iterate over all the high-degree 
vertices, adding to a triangle count when a high-degree vertex is adjacent to both u and v. Since there are 
O(h) high-degree vertices, this method takes O(h) time. These same steps can be done in reverse for an 
edge removal. 
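
The insertion step just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class and attribute names are hypothetical, the set of high-degree vertices is assumed to be fixed and supplied by the caller, and partition maintenance and edge removal are left out.

```python
from collections import defaultdict


class TriangleCounter:
    """Maintain the triangle count of an undirected graph under edge insertions.

    Assumption: `high` is a fixed set of high-degree vertices; maintaining the
    partition itself (Theorem 2.1) is outside the scope of this sketch.
    """

    def __init__(self, high=()):
        self.high = set(high)
        self.adj = defaultdict(set)
        # paths[{i, j}] = number of low-degree vertices adjacent to both i and j.
        self.paths = defaultdict(int)
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return
        # Triangles whose third vertex is low-degree: one dictionary lookup.
        self.triangles += self.paths.get(frozenset((u, v)), 0)
        # Triangles whose third vertex is high-degree: scan the high set.
        for w in self.high:
            if w in self.adj[u] and w in self.adj[v]:
                self.triangles += 1
        # New length-two paths whose middle vertex is a low-degree endpoint.
        for mid, end in ((u, v), (v, u)):
            if mid not in self.high:
                for w in self.adj[mid]:
                    self.paths[frozenset((w, end))] += 1
        self.adj[u].add(v)
        self.adj[v].add(u)
```

The count stays correct for any fixed choice of `high`; the choice only affects running time, which is why the partition of Theorem 2.1 matters for the O(h) bound.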

Whenever the partition changes, one must update P[·, ·] values to reflect vertices moving from high
to low, or low to high, which requires $O(h^2)$ time. Since there are amortized O(1/h) partition changes
per graph update, this updating takes O(h) amortized time per update. The randomization comes from the
choice of dictionary scheme used. The data structure as described requires O(mh) space, which is sufficient
to store the length-two paths with an intermediate low-degree vertex.

Figure 2: Scatter plot of h-index versus network size on a log-log scale, from Eppstein and Spiro [14].

Finally, to maintain counts of all induced undirected subgraphs on three vertices, it suffices to solve a
simple four-by-four system of linear equations relating induced subgraphs and non-induced subgraphs. This
allows one to keep counts of the induced subgraphs of every type with a constant amount of work in addition
to counting triangles. Extending this to directed subgraphs of size three and undirected subgraphs of size
four requires that we come up with a much larger system of equations, which characterize the combinatorial
relationships between such types of subgraphs.
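
The text does not spell out the four-by-four system, so the sketch below uses one standard formulation as an assumption: the four non-induced counts are the number of vertex triples, the number of (edge, third vertex) pairs, the number of pairs of edges sharing a vertex, and the number of triangles. The function name and argument layout are illustrative.

```python
from math import comb


def induced_three_vertex_counts(n, m, degrees, triangles):
    """Return induced counts (empty, one edge, two-edge path, triangle)."""
    # Non-induced counts, each obtainable from simple graph statistics.
    n0 = comb(n, 3)                         # all vertex triples
    n1 = m * (n - 2)                        # an edge plus any third vertex
    n2 = sum(comb(d, 2) for d in degrees)   # two edges sharing a vertex
    n3 = triangles
    # Back-substitute through the upper unit triangular system
    #   n0 = g0 + g1 +   g2 +   g3
    #   n1 =      g1 + 2*g2 + 3*g3
    #   n2 =             g2 + 3*g3
    #   n3 =                    g3
    g3 = n3
    g2 = n2 - 3 * g3
    g1 = n1 - 2 * g2 - 3 * g3
    g0 = n0 - g1 - g2 - g3
    return g0, g1, g2, g3
```

Since n, m and the degree-based sum can each be kept up to date in constant time per edge change, only the triangle count needs the h-index machinery.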

3 Directed Three-Vertex Induced Subgraphs 

Using the partitioning scheme detailed in Theorem 2.1, we can maintain counts for all possible induced
subgraphs on three vertices (see Fig. 3) in O(h) amortized time per update for a dynamic directed graph.
We begin by maintaining counts for induced subgraphs that are directed triangles; we then show how to
maintain counts of all induced subgraphs on three vertices.



Figure 3: The 16 possible directed graphs on three vertices, excluding isomorphisms, organized in
left-to-right order by number of edges in the graph. We label these graphs $T_0$ to $T_{15}$.

3.1 Counting Directed Triangles 

Let a directed triangle be a three-vertex directed graph with at least one directed edge between each pair of
vertices. There are seven possible directed triangles, labeled $D_0$ to $D_6$ in Fig. 4. We let $d_i$ denote the count
of induced directed triangles of type $D_i$ in the dynamic graph. We now show how to maintain each count
$d_i$ by extending Eppstein and Spiro's technique.

Figure 4: The 7 directed triangles, labeled $D_0$ to $D_6$.

When an edge (u, v) is added or removed from the graph, we would like to quickly compute the number
of directed triangles containing (u, v), in order to update the current counts. The third vertex of this directed
triangle can either be low- or high-degree. We handle these cases separately.

For a pair of vertices i and j, we define a joint to be a third vertex l that is adjacent to both i and j.
Vertices i, l and j are said to form an elbow. Fixing a vertex to be a joint, there are nine unique elbows
which we label $E_0$ to $E_8$ (see Fig. 5). We store a dictionary mapping pairs of vertices i and j to the number
of elbows of type $E_k$ formed by i and j and a low-degree joint, denoted $e_k[i, j]$.

We now discuss how the directed triangle counts change when adding an edge (u, v). We do not discuss 
edge removal since its effects are symmetric to edge insertion. 

Figure 5: The nine elbows with a fixed joint.

For directed triangles with a third low-degree vertex, we update our counts using the dictionary of elbow 
counts. If edge (v, u) is not in the graph, directed triangle counts increase as follows. 

If edge (v, u) is present in the graph, adding (u, v) destroys some directed triangles containing (v, u).
Therefore, the directed triangle counts change as follows.

\begin{align*}
d_0 &= d_0 - e_1[v,u] \\
d_1 &= d_1 - (e_0[v,u] + e_2[v,u] + e_3[v,u]) \\
d_2 &= d_2 + (e_0[u,v] + e_1[u,v]) - (e_5[v,u] + e_7[v,u]) \\
d_3 &= d_3 + e_3[u,v] - e_4[v,u] \\
d_4 &= d_4 + e_2[u,v] - e_6[v,u] \\
d_5 &= d_5 + (e_4[u,v] + e_5[u,v] + e_6[u,v] + e_7[u,v]) - e_8[v,u] \\
d_6 &= d_6 + e_8[u,v]
\end{align*}

To complete the directed triangle counting step, we iterate over the O(h) high-degree vertices to account
for directed triangles formed with u and v and a high-degree vertex, taking O(h) time.

If either u or v is a low-degree vertex, we must also update the elbow counts involving the added edge 
(u, v). We consider, without loss of generality, the updates when u is considered the low-degree elbow joint. 
For ease of notation, we categorize the different relationships between adjacent vertices as follows: 



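\begin{align*}
\mathrm{inneighbor}(u) &= \{w \in V : (w, u) \in E \wedge (u, w) \notin E\} \\
\mathrm{outneighbor}(u) &= \{w \in V : (u, w) \in E \wedge (w, u) \notin E\} \\
\mathrm{neighbor}(u) &= \{w \in V : (u, w) \in E \wedge (w, u) \in E\}.
\end{align*}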



We summarize the elbow count updates in Table 1.

Finally, when there is a partition change, we must update the elbow counts. If node w moves across the
partition, then we consider all pairs of neighbors of w and update their elbow counts appropriately. Since
there are $O(h^2)$ pairs of neighbors, and a constant number of elbows, this step takes $O(h^2)$ time. Since
O(1/h) amortized partition changes occur with each graph update, this step requires O(h) amortized time.



3.2 Subgraph Multiplicity 

Let the count for induced subgraph $T_i$ be called $t_i$. Furthermore, for a vertex v, let i(v) = |inneighbor(v)|,
o(v) = |outneighbor(v)|, and r(v) = |neighbor(v)|. We can represent the relationship between the number
of induced and non-induced subgraphs using a matrix equation of the form $A\,t = n$, where
$t = (t_0, \ldots, t_{15})^T$ and $n = (n_0, \ldots, n_{15})^T$.

On the right hand side, each $n_i$ is the count of the number of non-induced $T_i$ subgraphs in the dynamic
graph. Each $n_i$ (excluding directed triangle counts) is maintained in constant time per update by storing
a constant amount of structural information at each node, such as indegree, outdegree, and reciprocity of
neighbors. On the left hand side, position i, j in the matrix counts how many non-induced subgraphs of
type $T_i$ appear in $T_j$. We are counting non-induced subgraphs in two ways: (1) by counting the number of
appearances within induced subgraphs and (2) by using the structure of the graph. Since the multiplicand is
an upper (unit) triangular matrix, this matrix equation is easily solved, yielding the induced subgraph counts.
Thus, we can maintain the counts for three-vertex induced subgraphs in a directed dynamic graph in O(h)
amortized time per update, and O(mh) space, plus the additional overhead for the choice of dictionary.
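
Solving a system with an upper unit triangular matrix needs only back substitution, with no division, so integer counts stay exact. The helper below is a generic sketch under that assumption; its name and its list-of-lists matrix format are illustrative, and it does not encode the specific 16-by-16 matrix of the directed case.

```python
def solve_upper_unit_triangular(matrix, rhs):
    """Solve matrix * x = rhs for an upper triangular matrix with unit diagonal."""
    size = len(rhs)
    x = [0] * size
    # Work from the last row upward; each row introduces one new unknown.
    for i in range(size - 1, -1, -1):
        x[i] = rhs[i] - sum(matrix[i][j] * x[j] for j in range(i + 1, size))
    return x
```

For a constant-size matrix this costs constant time per query, which is consistent with the claim that the induced counts add no more than constant overhead.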



4 Four-Vertex Subgraphs

We begin by describing the data structure for our algorithm. It will be necessary to maintain the counts of 
various subgraph structures. The data structure in whole consists of the following information: 

• Counts of the non-induced subgraph structures, $m_3$ through $m_{10}$.

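Table 1: Summary of updating elbow counts when u is considered a low-degree joint.

                              | (v, u) not in E               | (v, u) in E
------------------------------+-------------------------------+------------------------------
w in inneighbor(u) \ {v}      | e_0[w, v] = e_0[w, v] + 1     | e_6[w, v] = e_6[w, v] + 1
                              | e_1[v, w] = e_1[v, w] + 1     | e_5[v, w] = e_5[v, w] + 1
------------------------------+-------------------------------+------------------------------
w in outneighbor(u) \ {v}     | e_0[v, w] = e_0[v, w] + 1     | e_4[v, w] = e_4[v, w] + 1
                              | e_1[w, v] = e_1[w, v] + 1     | e_7[w, v] = e_7[w, v] + 1
------------------------------+-------------------------------+------------------------------
w in neighbor(u) \ {v}        | e_4[w, v] = e_4[w, v] + 1     | e_8[w, v] = e_8[w, v] + 1
                              | e_1[v, w] = e_1[v, w] + 1     | e_8[v, w] = e_8[v, w] + 1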

• A set E of the edges in the graph, indexed such that given a pair of endpoints there is a constant-time
lookup to determine if they are linked by an edge.

• A partition of the vertices of the graph into two sets H and V \ H.

• A dictionary $P_1$ mapping each vertex u to a pair $P_1[u] = (s_0[u], s_1[u])$. This pair contains the counts
for the structures $S_0$ and $S_1$ that involve vertex u (see Fig. 6). That is, the count of the number of
two-edge paths that begin at u and pass through two vertices in V \ H and the number of these paths
that connect back to u forming a triangle. We only maintain nonzero values for these numbers in $P_1$;
if there is no entry in $P_1[u]$ for the vertex u then there exists no such path from u.

• A dictionary $P_2$ mapping each pair of vertices u, v to a tuple $P_2[u, v] = (s_2[u, v], s_3[u, v], s_4[u, v],
s_5[u, v], s_6[u, v])$. This tuple contains the counts for the structures $S_2$ through $S_6$ that involve vertices
u and v (see Fig. 6). That is, the number of two-edge paths from u to v via a vertex of V \ H, the
number of three-edge paths from u to v via two vertices of V \ H, the number of structures in which
both u and v connect to the same vertex in V \ H which connects to another vertex in V \ H, the
number of structures similar to the last in which the final vertex in V \ H shares an edge connection
with u or v, and the number of structures where between u and v there are two two-edge paths through
vertices of V \ H in which the two vertices in V \ H share an edge connection. Again, we only maintain
nonzero values.

• A dictionary $P_3$ mapping each triple of vertices u, v, w to a number $P_3[u, v, w] = s_7[u, v, w]$. This
value is the count for the structure $S_7$ that involves vertices u, v, and w (see Fig. 6). That is, the
number of vertices in V \ H that share edge connections with all three vertices. As before, we only
maintain nonzero values for these numbers.
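
The stored information listed above could be grouped into a single container along the following lines. This is only a sketch of the layout: every field and method name is hypothetical, unordered vertex pairs and triples are assumed to be keyed by `frozenset`, and the update logic for the counts is not included.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FourVertexState:
    """Container for the stored information; all names are illustrative."""

    # m[i] is the non-induced count of structure Q_i, for i = 3..10.
    m: dict = field(default_factory=lambda: {i: 0 for i in range(3, 11)})
    # Edge set with constant-time membership tests.
    edges: set = field(default_factory=set)
    # High-degree side H of the partition; every other vertex is in V \ H.
    high: set = field(default_factory=set)
    # p1[u] = [s0, s1], keyed by a single vertex.
    p1: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))
    # p2[frozenset({u, v})] = [s2, s3, s4, s5, s6].
    p2: dict = field(default_factory=lambda: defaultdict(lambda: [0] * 5))
    # p3[frozenset({u, v, w})] = s7.
    p3: dict = field(default_factory=lambda: defaultdict(int))

    def has_edge(self, u, v):
        return frozenset((u, v)) in self.edges

    def add_edge_record(self, u, v):
        self.edges.add(frozenset((u, v)))
```

Reading with `.get(key)` rather than indexing avoids creating zero entries, which keeps the dictionaries limited to nonzero values as the text requires.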

Upon insertion of an edge between vertices $v_1$ and $v_2$ we will need to update the dictionaries $P_1$, $P_2$,
and $P_3$. If both $v_1$ and $v_2$ are in H, no update is necessary.

If $v_1$ and $v_2$ are both in V \ H then we will need to update the counts $s_0$ through $s_6$. First find which
vertices in H connect to $v_1$ or to $v_2$. Increment $s_0$ for these vertices. If both vertices in V \ H connect to
the same vertex in H then increment $s_1$ for this vertex. Increment $s_2$ for $v_1$ and the vertices that connect
to $v_2$, and for $v_2$ and the vertices that connect to $v_1$. Then increment $s_3$ based on pairs of neighbors of $v_1$
and $v_2$ and neighbors of neighbors in V \ H. If either $v_1$ or $v_2$ connects to two vertices in H, increment $s_4$
for the vertices in H. Considering $v_1$ to be the vertex with edge connections to two vertices in H, for each
vertex in H that connects to $v_2$ increment $s_5$. For two vertices in H such that $v_1$ and $v_2$ each connect to
both, increment $s_6$ for the vertices in H.

If $v_1$ and $v_2$ are such that one is in V \ H and the other in H we will proceed as follows. Consider $v_1$ to
be the vertex in V \ H. First, determine the number of vertices in V \ H connected to $v_1$ and increase $s_0$ for
$v_2$ by that amount. Upon discovering these adjacent vertices in V \ H, test their connection to $v_2$. For each of
those connected, increment $s_1$ for $v_2$. It is necessary to determine which vertices in H share an edge with $v_1$.
After these connections have been determined, increment the appropriate dictionary entries. Form pairs with
$v_2$ and the connected vertices in H and update the $s_2$ counts. Form triples with $v_2$ and two other connected
vertices in H and update the counts in $s_7$. The $s_5$ update comes from determining the triangles formed by
the additional edge and using the degree of the vertices in H, and the count of the connected triangles, which
can be calculated by searching for attached vertex pairs in H and using $s_2$. In order to update the count for
$s_6$, begin with location of vertex pairs as with the elbow update. For each of the H vertex pairs increase the
stored value by the number of vertices in V \ H that share an edge with $v_1$ and with both of the vertices in
H, which can be retrieved from $s_2$.

Figure 6: We store counts of these eight non-induced subgraphs to maintain counts of four-vertex
non-induced subgraphs $Q_3$ to $Q_{10}$. The counts are indexed by the labels of the white vertices, and the blue
vertices indicate a vertex has low-degree.

Figure 7: The 11 possible graphs on four vertices, excluding isomorphisms, organized in left-to-right order
by number of edges in the graph.

Examining the time complexity we can see that in order to generate the dictionary updates the most
complex operation involves examination of two sets of connected vertices consecutively that are O(h) in
size each. This results in $O(h^2)$ operations to determine which updates are necessary. Since it is possible to
see from the structure of the stored items that no single edge insertion can result in more than $O(h^2)$ new
structures, this will be the upper bound on dictionary updates, and make $O(h^2)$ the time complexity bound.

These maintained counts will have to be modified when the vertex partition is updated. If a vertex is
moved from H to V \ H then it is necessary to count the connected structures it now forms. This can be
done by examining all edges formed by this vertex, and following the procedure for edge additions. When a
vertex is moved into H it is necessary to count the structures it had been forming as a vertex in V \ H and
decrement the appropriate counts. This can be done similarly to the method for generating new structures.
In analysis of the partition updates we see that since we are working with a single vertex with O(h) degree
the complexity has an additional O(h) factor to use the edge-based dictionary update scheme. This results
in $O(h^3)$ time per update. Since this partition update is done an average of O(1/h) times per operation, the
amortized time for updates, per change to the input graph, is $O(h^2)$.

4.1 Subgraph Structure Counts 

The following section covers the update of the subgraph structure counts after an edge between vertices $v_1$
and $v_2$ has been inserted. Let these vertices have degree count $d_1$ and $d_2$ respectively. Recall that $m_i$ refers
to the count of the non-induced subgraph of the structure $Q_i$ (see Fig. 7).

The $m_3$ count will be increased by $m - (d_1 + d_2 - 2)$, where m is the number of edges in the graph.
Since this structure consists of two edges that do not share vertices, the increase of the count comes from a
selection of a second edge to be paired with the inserted edge. The second term in the update value reflects
the number of edges that are connected to the inserted edge.

The $m_4$ count will be updated as follows. Each of the two vertices can be the center of a claw structure.
From each such vertex two edges in addition to the newly inserted edge must be selected. Thus the value to update
the count is $\binom{d_1 - 1}{2} + \binom{d_2 - 1}{2}$.
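
The two increments described so far (for disjoint edge pairs and for claws) can be written as short functions. The text leaves open whether m and the degrees are measured before or after the insertion, so this sketch fixes one convention as an assumption: the edge count excludes the inserted edge, while both degrees include it. The function names are illustrative.

```python
from math import comb


def delta_disjoint_edge_pairs(m_before, d1, d2):
    """Increase in the count of edge pairs that share no vertex.

    Assumes m_before excludes the inserted edge, while d1 and d2 include it,
    so d1 + d2 - 2 existing edges touch one of its endpoints.
    """
    return m_before - (d1 + d2 - 2)


def delta_claws(d1, d2):
    """Increase in the claw count: choose two further edges at either endpoint."""
    return comb(d1 - 1, 2) + comb(d2 - 1, 2)
```

Both increments depend only on the edge count and two degrees, so they cost constant time per update.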

The $m_5$ count is updated by calculating the number of additional triangles the edge addition would add,
which can be done with the Eppstein-Spiro [13] method, and multiplying that by a factor of (n - 3) to reflect
the selection of the additional vertex, where n is the number of vertices in the graph.

The update for $m_6$ is done in parts based on which position in the structure the edge is forming. The
increase to the count for the new structures in which the additional edge is the center in the three-edge path
is $(d_1 - 1)(d_2 - 1)$.

This value will be increased by the count when the new edge is not the center of the structure. The
process to calculate the count increase will assume that $v_1$ connects to the rest of this structure. The same
process can be done without loss of generality with the assumption that $v_2$ connects to the rest of the structure.
These values will then both be added to form the final part of the count update. If $v_1$ is an element of H then
we will sum the results from the following subcases. First we consider the case where the vertex adjacent to
$v_1$ is in H. The number of these paths of length two originating at $v_1$ can be counted by summing the degree
of these vertices minus 1. We must also subtract one for each of the adjacent vertices in H that are adjacent
to $v_2$. If $v_1$ is not an element of H, then it has h or fewer neighbors. Sum over all neighbors the following
value: if the vertex does not have an edge connecting it to $v_2$ then the degree of the vertex; if it does, the
degree minus one.

The $m_7$ count is updated as follows. An inserted edge can form the structure in three positions, so our
final update will be the sum of those three counts. For the first case let the inserted edge be the additional
edge connected to the triangle. For this case, we must do all of the following for both vertices and sum the
result. If the vertex is in H retrieve $s_1$. This gives us the connected triangles through vertices in V \ H.
Then determine which vertices in H connect to the vertex. Form the triangle counts with all vertices in H.
Form those with one additional vertex in H using $s_2$. If the vertex is in V \ H, then determine its neighbors'
connections and form a connected triangle count.

In the second case the edge is in the triangle and shares a vertex with the additional edge. The count can
be determined in two parts. First, the triangles: if either $v_1$ or $v_2$ is in V \ H then the triangle count can
be calculated. If both are in H then a lookup to $s_2$ will determine the number of triangles. The number of
additional edges can then be calculated using the degrees of the vertices of the inserted edge, with care to
not count the edges used to form the triangle. The product of the triangle count and the additional edge count
will form the increase for this case.

The final case occurs when the inserted edge is part of the triangle, but does not share a vertex with the
additional edge. If either $v_1$ or $v_2$ is in V \ H then the triangle count can be calculated, and the degree
of the vertices used to form these triangles can be used to calculate the count increase. If both $v_1$ and $v_2$ are
in H then there are three remaining subcases. The count if all vertices are in H can be determined. If the
vertex on the additional edge that is not in the triangle is in H, then using the three known vertices in H and
a lookup from $P_2$ can yield the counts. If both remaining vertices are in V \ H this is the structure stored
in $s_4$, and counts can be retrieved. Sum the counts for these subcases to calculate the total increase for this
case.

The count for $m_8$ is increased upon edge update by a sum of the following. The count of the length three
paths through vertices in V \ H can be looked up in $s_3$. There are two possible types of length three paths
remaining. In the first, both vertices are in H. These paths can be counted by examining the connections
between $v_1$, $v_2$, and all vertices in H. The second contains one vertex in H and one in V \ H. These paths
can be counted by establishing which vertices in H connect to either $v_1$ or $v_2$, and then using the count in
$s_2$ of the length two paths from the vertices in H to $v_2$ or $v_1$ respectively.

The $m_9$ count can be increased by an edge insert in two positions. The first is between the opposite
ends of the cycle. If either $v_1$ or $v_2$ is in V \ H then the edge connections can be determined and the count
calculated. If both $v_1$ and $v_2$ are in H then the count of the two two-edge paths that form the cycle must be
determined. These paths will either pass through a vertex in H or a vertex in V \ H. The former can be
counted by examining the vertices in H, and the latter by a lookup to $s_2$.

The second possible position for an edge insert is on the outer path of a cycle that already has an edge
through it. If either $v_1$ or $v_2$ is in V \ H calculate the count as follows, summing with an additional
calculation considering the vertices reversed. If the vertex connected to the triangle is in V \ H then the
count can be determined by examining neighbors and their edge connections. If the vertex not connected to
the triangle is in V \ H then examine the neighbors. For those neighbors that are in V \ H the count can
be determined by examining additional edge connections of neighbors. For the neighbors in H a lookup to $s_2$
is required to completely determine the counts. If both $v_1$ and $v_2$ are in H then the count is calculated as
follows. If all vertices of the structure are in H, determine the count by examining edge connections. If both
remaining vertices are in V \ H the count can be determined by lookup to $s_5$. Otherwise, one of the two
remaining vertices is in H. This will leave a structure that can be completed and provide a count by using a
lookup to $s_2$ or $s_7$.

The $m_{10}$ count update is separated by the membership of $v_1$ and $v_2$. If either vertex is contained in
V \ H, say $v_1$, then it is possible to determine which vertices connect to $v_1$ and which of these share
edges with $v_2$ and each other. This count can be calculated and the total count can be updated. If both $v_1$
and $v_2$ are in H then we will sum the values determined in the following three subcases. First, all four
vertices are in H. This count can be determined by examining the edge connections of the vertices in H. If
three vertices in H form the correct structure, the count of cliques formed with one vertex in V \ H can be
determined by a lookup to $s_7$. These counts should be summed for all vertices in H that form the correct
structure with $v_1$ and $v_2$. The final count, with both of the remaining vertices in V \ H, can be determined
by an $s_6$ lookup.

The time complexity for the updates of the stored subgraphs is $O(h^2)$. Calculations and lookups can
be performed in constant time, and subcase calculations can be done independently. The most complicated
subcase count computations involve examination of two sets of connected vertices consecutively that are
O(h) in size each. This results in $O(h^2)$ operations. The space complexity for our data structure is O(1)
for the maintained subgraph counts, O(m) for E, O(n) for the partition to maintain H, and $O(mh^2)$ for the
dictionaries, because each edge belongs to at most $O(h^2)$ subgraph structures.

4.2 Subgraph Multiplicity 

The data structure in the previous section only maintains counts of certain subgraph structures. With the 
addition of m, n, and the count of length two paths, where m is the number of edges and n the number of 
vertices, it is possible to use these counts to determine the counts of all subgraphs on four vertices. The 
additional values m, n, and the count of length two paths can be maintained in constant time per update. 
Values for m and n are modified incrementally. Adding an edge uv will increase the count of length two
paths by $d_u + d_v$, the degrees of u and v respectively before the insertion. Removing the edge will decrease the value by
Similar to the matrix for size three subgraphs,
the right and the composition of the induced subgr 
can use the counts of the non-induced subgraphs on
s to determine the counts of any desired subgraph. 



5 Conclusion 

The work we present here can maintain counts for all 3-vertex directed subgraphs in O(h) amortized time per
update. This can be done in O(mh) space. For the undirected case, we maintain counts of size-four
subgraphs in $O(h^2)$ amortized time per update and $O(mh^2)$ space. Although we do not discuss the specifics
in this paper, the methodology presented can be used to count directed size-four subgraphs with similar
complexity. These developments open significant possibility for improvement in calculating graphlet
frequencies within Bioinformatics and in ERGM applications for social network analysis.